ABBS article: DNAskew for Base Compositional Asymmetry and Replication Boundaries in the Genome Sequences

http://www.abbs.info E-mail:[email protected]
ISSN 0582-9879 Acta Biochim et Biophysica Sinica 2004, 36(1):016-020 CN 31-1300/Q

DNAskew: Statistical Analysis of Base Compositional Asymmetry and Prediction of Replication Boundaries in the Genome Sequences

Xiang-Ru MA, Shao-Bo XIAO, Ai-Zhen GUO, Jian-Qiang LÜ, and Huan-Chun CHEN*

Laboratory of Animal Virology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan 430070, China

Abstract Sueoka and Lobry independently proposed that, in the absence of bias between the two DNA strands in mutation and selection, the base composition within each strand should satisfy A=T and C=G (a state called Parity Rule type 2, PR2). However, the genome sequences of many bacteria, vertebrates and viruses show asymmetries in base composition and gene direction. To examine the relationship of base composition skews with replication orientation, gene function, codon usage bias and phylogenetic evolution, we developed a program called DNAskew for the statistical analysis of strand asymmetry and codon composition bias in DNA sequences. The program can also be used to predict the replication boundaries of genome sequences, a method that builds on the compositional asymmetries between the leading and the lagging strands of replication. DNAskew was written in Perl and implemented on the Linux operating system. It works quickly with annotated or unannotated sequences in GBFF (GenBank flat file) or FASTA format. The source code is freely available for academic use at http://www.epizooty.com/pub/stat/DNAskew.
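The idea of measuring deviation from PR2 and locating replication boundaries from strand asymmetry can be illustrated with a minimal Python sketch. It is not the DNAskew source code (which is written in Perl); the function names, the normalized skew formula (x - y)/(x + y), the use of non-overlapping windows, and the convention that the cumulative GC skew reaches its minimum near the origin and its maximum near the terminus are all assumptions made for illustration. That convention holds for many bacterial genomes but not for every genome.

```python
from itertools import accumulate


def skew(x, y):
    """Normalized skew (x - y) / (x + y); 0.0 when both counts are zero."""
    total = x + y
    return (x - y) / total if total else 0.0


def pr2_skews(seq):
    """Whole-sequence AT and GC skews; both are 0 when PR2 (A=T, C=G) holds."""
    seq = seq.upper()
    return {
        "AT_skew": skew(seq.count("A"), seq.count("T")),
        "GC_skew": skew(seq.count("G"), seq.count("C")),
    }


def windowed_gc_skew(seq, window):
    """GC skew in consecutive non-overlapping windows of the given size."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        skews.append(skew(chunk.count("G"), chunk.count("C")))
    return skews


def predict_boundaries(seq, window):
    """Guess replication boundaries from the cumulative GC skew.

    Assumption (hypothetical convention, not taken from the paper):
    the cumulative skew is lowest at the origin and highest at the
    terminus. Positions are the end coordinates of the windows where
    the extremes occur.
    """
    skews = windowed_gc_skew(seq, window)
    if not skews:
        raise ValueError("sequence is shorter than one window")
    cumulative = list(accumulate(skews))
    i_min = min(range(len(cumulative)), key=cumulative.__getitem__)
    i_max = max(range(len(cumulative)), key=cumulative.__getitem__)
    return {
        "origin": (i_min + 1) * window,
        "terminus": (i_max + 1) * window,
    }


# Toy sequence: G-rich, then C-rich, then G-rich again.
toy = "AGGG" * 5 + "ACCC" * 10 + "AGGG" * 5
boundaries = predict_boundaries(toy, window=10)
```

On the toy sequence, the skew switches sign at positions 20 and 60, and `predict_boundaries` reports those two positions as the terminus and the origin respectively. Real genomes need much larger windows, and circular chromosomes need the coordinates interpreted modulo the genome length.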

Key words strand asymmetry; base composition; statistical analysis; replication origin; bioinformatics

-----------------

Received: July 8, 2003 Accepted: October 29, 2003

This work was supported by a grant from the National High Technology Research and Development Program of China (863 Program) (No. 2001AA213051).

*Corresponding author: Tel, 86-27-87282608; E-mail, [email protected]